feat(ribocode,ribotish): pyfasta indexes, prefix-scoped outputs, optional ribotish -a #11684
Closed
pinin4fjords wants to merge 1 commit into
…onal ribotish -a

Bundles three in-place module changes carried in nf-core/riboseq#174.

ribocode/prepare: pre-build the pyfasta .gdx/.flat indexes for annotation/transcripts_sequence.fa using the same key_fn RiboCode applies internally (split on first space, else split on '|'). Downstream RiboCode tasks otherwise lazily build those sidecars inside the staged input directory, which fails under Fusion staging because writes leak back to the upstream task's S3 prefix.

ribocode/ribocode: scope the orf_txt and orf_txt_collapsed output globs to ${prefix}.txt and ${prefix}_collapsed.txt rather than *.txt/*_collapsed.txt so multi-instance publication is unambiguous. The prefix binding is promoted out of `def` in both the script and stub blocks so it resolves at the output-glob stage (Nextflow 26 strict parser rejects redeclaration of the same name across script/stub if either uses `def`). The existing stub-test assertion that indexed orf_txt[0][1][0] is adjusted to the new single-file shape.

ribotish/predict: extend the fasta/gtf input tuple with an optional fourth path, reference_gtf, plumbed to ribotish predict as `-a <gtf>` when populated. BREAKING signature change for callers: every emitter must supply a fourth element in the third tuple (use `[]` for the no-op case).

Source: nf-core/riboseq#174

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 18, 2026
Superseded by the per-module splits:
Closing this bundled draft. Branch |
maxulysse pushed a commit to maxulysse/nf-core_modules that referenced this pull request on May 19, 2026
…-core#11685)

* feat(ribocode): pre-build pyfasta indexes + prefix-scoped outputs

Two related changes carried in nf-core/riboseq#174 and split out of the bundled PR nf-core#11684.

ribocode/prepare: pre-build the pyfasta `.gdx`/`.flat` indexes for `annotation/transcripts_sequence.fa` immediately after `prepare_transcripts`, using the same `key_fn` RiboCode applies internally (split on first space, otherwise split on `|`). Stub touches the two new sidecars.

Why: downstream RiboCode steps open the FASTA with pyfasta, which lazily writes `.gdx`/`.flat` next to the input on first read. Under Fusion staging those writes land back at the upstream task's S3 prefix and silently corrupt the staged copy on retries. Building the indexes inside the producing task fixes it.

ribocode/ribocode: switch the `orf_txt` and `orf_txt_collapsed` output globs from `*.txt` / `*_collapsed.txt` to `${prefix}.txt` / `${prefix}_collapsed.txt` so multi-instance publication is unambiguous (`*.txt` previously matched both files into the same emit). The `prefix` binding is promoted out of `def` in both `script:` and `stub:` so it resolves at the output-glob stage; the Nextflow 26 strict parser rejects re-declaring the same local with `def` across both blocks. The existing stub assertion at `process.out.orf_txt[0][1][0]` is corrected to the new single-file shape (`process.out.orf_txt[0][1]`).

Source: nf-core/riboseq#174

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(ribocode/prepare): reframe pyfasta pre-build comment

The lazy pyfasta sidecar write isn't Fusion-specific - it's a Nextflow symlink-staging concern that affects any backend (writes leak back to the producer task's work dir via the staged-input symlink). Rewording the inline comment to match. No code change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(ribocode/prepare): use RiboCode's GenomeSeq for pyfasta pre-build

Replace the inline 8-line python heredoc (which replicated RiboCode's `get_chrom` key_fn verbatim) with a single `python -c` line that imports and instantiates `RiboCode.prepare_transcripts.GenomeSeq` directly. The class constructor itself runs `Fasta(filename, key_fn=get_chrom)` with the same key function, so we drop the replication while producing byte-identical .gdx/.flat sidecars (md5-verified on the realistic FASTA format prepare_transcripts emits). No snapshot change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
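The key function described in the commit messages above ("split on first space, otherwise split on `|`") can be illustrated with a small sketch. This is one reading of that description, not RiboCode's actual `get_chrom` source: the name `header_key` and the sample headers are hypothetical, and the real function may handle edge cases differently.

```python
def header_key(header: str) -> str:
    """Derive a lookup key from a FASTA header (text after '>').

    Assumed behaviour, based only on the prose description:
    if the header contains a space, keep the part before the first space;
    otherwise keep the part before the first '|'.
    """
    if " " in header:
        return header.split(" ", 1)[0]
    return header.split("|", 1)[0]


# Hypothetical headers in the two styles the description distinguishes.
headers = [
    "ENST00000000001 gene=EXAMPLE",      # space-delimited description
    "ENST00000000002|ENSG00000000002|",  # pipe-delimited fields
    "ENST00000000003",                   # bare identifier
]
keys = [header_key(h) for h in headers]
```

A function of this shape is what pyfasta accepts as `key_fn`, which is why the index files depend on it: a pre-built index is only reusable if it was written with the same key function that the downstream reader applies.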
maxulysse pushed a commit to maxulysse/nf-core_modules that referenced this pull request on May 19, 2026
…s + 0.2.8 (nf-core#11686)

* feat(ribotish/predict): add optional secondary reference GTF for -a

Carried in nf-core/riboseq#174 and split out of the bundled PR nf-core#11684.

**Breaking signature change.** The third input tuple gains an optional fourth element, `reference_gtf`, plumbed through to `ribotish predict` as `-a <gtf>` when populated:

    tuple val(meta3), path(fasta), path(gtf), path(reference_gtf, stageAs: 'secondary.gtf')

Callers must supply a fourth element on every emit. Pass `[]` for the no-op case (no secondary annotation). The existing test cases in this PR are migrated that way; positive-coverage tests for the populated path will land in a follow-up.

Why: Ribo-TISH's `-a` argument is the documented hook for layering a secondary annotation (e.g. MANE/RefSeq) on top of the primary GTF, and we want to expose it from the module without a second optional input tuple.

Source: nf-core/riboseq#174

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(ribotish/predict): optional reference_gtf in its own tuple + topics versions + bump 0.2.8

Three coupled cleanups in response to the lint feedback on PR nf-core#11686:

1. Move the new `reference_gtf` input out of the existing fasta/gtf tuple and into its own optional input tuple (meta7) - the convention this module already uses for `bam_ti`, `candidate_orfs`, `para_ribo`, and `para_ti`. The existing `(meta3, fasta, gtf)` signature is preserved, so callers no longer need to grow that tuple; they wire a separate `Channel.of([[], []])` (or a populated channel) into the new slot.

2. Migrate version reporting from the legacy `versions.yml` heredoc to the new topic-based emission (`tuple val("${task.process}"), val('ribotish'), eval('...'), topic: versions, emit: versions_ribotish`). The `versions.yml` heredoc is removed from both `script:` and `stub:`. `meta.yml` regenerated by `nf-core modules lint --fix` to add the `topics:` block and reshape the `versions_ribotish` output entry.

3. Bump ribotish from 0.2.7 to 0.2.8 (bioconda; build hash unchanged). Test snapshot regenerated under `--update`: versions snapshot key renamed from `versions_*` to `versions_ribotish_*`, version string updated to `0.2.8`. Prediction-table assertions unchanged - 0.2.8 is a patch release.

Source: nf-core/riboseq#174

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(ribotish/predict): consolidate to one unnamed snapshot per test

Per SPPearce's review comment on nf-core#11686: each test should have a single anonymous snapshot() call rather than multiple named ones.

Non-stub tests roll `transprofile` + the topic-versions findAll into one snapshot; the existing `predictions` / `all` contains() row checks are kept as separate assertions (they pin specific known-good output rows and aren't redundant with the snapshot). Stub tests roll `predictions` + `all` + `transprofile` + versions into one snapshot.

Versions are referenced via the canonical `process.out.findAll { key, val -> key.startsWith('versions') }` pattern (653 modules in nf-core/modules use it vs 53 with explicit `process.out.versions_<tool>`). Snapshot keys are now the test names directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
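The "plumbed through as `-a <gtf>` when populated" behaviour described in the commit messages above amounts to a conditional argument fragment. The sketch below shows that pattern in Python rather than in the module's own Nextflow script. The helper name `secondary_annotation_args` is hypothetical, and only the `-a` flag and the `secondary.gtf` staging name come from the commit text; the rest of the `ribotish predict` command line is deliberately left out.

```python
from typing import List, Optional


def secondary_annotation_args(reference_gtf: Optional[str]) -> List[str]:
    """Return the `-a <gtf>` fragment only when a secondary GTF is supplied.

    An empty or missing value (the `[]` no-op case on the Nextflow side)
    yields no extra arguments, so the base command is left unchanged.
    """
    return ["-a", reference_gtf] if reference_gtf else []


base_cmd = ["ribotish", "predict"]  # other required options omitted here

cmd_without = base_cmd + secondary_annotation_args(None)
cmd_with = base_cmd + secondary_annotation_args("secondary.gtf")
```

Keeping the no-op case as an empty fragment means the command line matches the pre-change one whenever no secondary annotation is supplied.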
Bundles three in-place module changes carried in nf-core/riboseq#174. Each is self-contained and addresses a different pain point we hit running RiboCode / Ribo-TISH at scale.
ribocode/prepare
Pre-build the pyfasta `.gdx`/`.flat` indexes for `annotation/transcripts_sequence.fa` immediately after `prepare_transcripts`, using the same `key_fn` RiboCode applies internally (split on first space, otherwise split on `|`). The stub also touches the two new sidecars.

Why: downstream RiboCode steps open the FASTA with pyfasta, which lazily writes `.gdx`/`.flat` next to the input on first read. Under Fusion staging those writes land back at the upstream task's S3 prefix and silently corrupt the staged copy on retries. Building the indexes inside the producing task fixes it.

ribocode/ribocode
Switch the `orf_txt` and `orf_txt_collapsed` output globs from `*.txt` / `*_collapsed.txt` to `${prefix}.txt` / `${prefix}_collapsed.txt` so multi-instance publication is unambiguous (`*.txt` previously matched both the all-ORFs and collapsed files into the same emit). The `prefix` binding is promoted out of `def` in both `script:` and `stub:` so it resolves at the output-glob stage; the Nextflow 26 strict parser rejects re-declaring the same local with `def` across the two blocks.

The existing stub assertion that indexed `process.out.orf_txt[0][1][0]` is corrected to the new single-file shape (`process.out.orf_txt[0][1]`).

ribotish/predict
Breaking signature change. The third input tuple gains an optional fourth element, `reference_gtf`, plumbed through to `ribotish predict` as `-a <gtf>` when populated:

`tuple val(meta3), path(fasta), path(gtf), path(reference_gtf, stageAs: 'secondary.gtf')`

Callers must supply a fourth element on every emit. Pass `[]` for the no-op case (no secondary annotation). The existing test cases in this PR are migrated that way; positive-coverage tests for the populated path will land in a follow-up.

Why: Ribo-TISH's `-a` argument is the documented hook for layering a secondary annotation (e.g. MANE/RefSeq) on top of the primary GTF, and we want to expose it from the module without a second optional input tuple.

Test plan
All three modules pass under Docker on a `c5.9xlarge` VM with `nf-core 4.0.2` / `nextflow 26.04.1` / `nf-test 0.9.5`.

Snapshot deltas:
- `ribocode/prepare`: non-stub snapshot gains the two new file md5s (`transcripts_sequence.fa.flat`, `transcripts_sequence.fa.gdx`); existing files' md5s unchanged.
- `ribocode/ribocode`: `orf_outputs` snapshot drops the duplicate `test_collapsed.txt` entry that the old `*.txt` glob had pulled into `orf_txt`; everything else unchanged.
- `ribotish/predict`: no snapshot change (the `[]` migration is a no-op at runtime).

Source: nf-core/riboseq#174
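
The `ribocode/ribocode` snapshot delta follows directly from how the two glob styles match. The sketch below reproduces the overlap with Python's `fnmatch`, which stands in for Nextflow's output matcher purely for illustration. The prefix `test` and the file name `test.txt` are assumptions inferred from the `test_collapsed.txt` entry mentioned in the snapshot delta.

```python
from fnmatch import fnmatch
from typing import List


def matched(pattern: str, files: List[str]) -> List[str]:
    """Return the files that a shell-style pattern selects, sorted."""
    return sorted(f for f in files if fnmatch(f, pattern))


prefix = "test"  # assumed prefix for the illustration
files = [f"{prefix}.txt", f"{prefix}_collapsed.txt"]

# Old globs: `*.txt` also captures the collapsed file.
old_orf_txt = matched("*.txt", files)
old_collapsed = matched("*_collapsed.txt", files)

# New prefix-scoped patterns: each output selects exactly one file.
new_orf_txt = matched(f"{prefix}.txt", files)
new_collapsed = matched(f"{prefix}_collapsed.txt", files)
```

With the old pattern `orf_txt` is a two-element list, which is why the stub assertion indexed `[0][1][0]`. With the scoped pattern it is a single file, so the assertion drops the final index.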